Default Clustering from Sparse Data Sets

Authors

Julien Velcin

Jean-Gabriel Ganascia

Abstract

Categorization with a very high missing data rate is seldom studied, especially from a non-probabilistic point of view. This paper proposes a new algorithm called default clustering that relies on default reasoning and uses the local search paradigm. Two kinds of experiments are considered: the first presents results obtained on artificial data sets; the second uses an original real-world case in which political stereotypes are extracted from newspaper articles from the end of the 19th century.

Similar resources

Modeling Default Induction with Conceptual Structures

Our goal is to model the way people induce knowledge from rare and sparse data. This paper describes a theoretical framework for inducing knowledge from these incomplete data described with conceptual graphs. The induction engine is based on a non-supervised algorithm named default clustering which uses the concept of stereotype and the new notion of default subsumption, the latter being inspir...

Clustering of Conceptual Graphs with Sparse Data

This paper gives a theoretical framework for clustering a set of conceptual graphs characterized by sparse descriptions. The formed clusters are named in an intelligible manner through the concept of stereotype, based on the notion of default generalization. The cognitive model we propose relies on sets of stereotypes and makes it possible to save data in a structured memory.

Default Clustering with Conceptual Structures

This paper describes a theoretical framework for inducing knowledge from incomplete data sets. The general framework can be used with any formalism based on a lattice structure. It is illustrated within two formalisms: the attribute-value formalism and Sowa’s conceptual graphs. The induction engine is based on a non-supervised algorithm called default clustering which uses the concept of stereo...

Multi-rank Sparse Hierarchical Clustering

There has been a surge in the number of large and flat data sets – data sets containing a large number of features and a relatively small number of observations – due to the growing ability to collect and store information in medical research and other fields. Hierarchical clustering is a widely used clustering tool. In hierarchical clustering, large and flat data sets may allow for a better co...
